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1,  Introduction 


In  this  paper  we  continue  the  investigation  of  variance  reduction 
techniques  for  simulating  Markov  chains  that  was  begun  in  Heidelberger  (1977). 
Although  a brief  review  of  the  notation  and  results  of  that  paper  will 
be  given  here,  the  reader  is  assumed  to  be  familiar  with  the  contents  of 
the  previous  paper. 

Let  {X^,  n > 0)  be  an  irreducible,  aperiodic,  positive  recurrent 
Markov  chain  with  finite  state  space  E = {0,  1,  ...,  N^)  (N^  < »), 
and  transition  matrix  P = (p^j.'i,j  £ E}.  It  is  well  known  that  there 
exists  a probability  distribution  tr  = £ E)  on  E and  a random 

variable  X,  having  distribution  tt,  such  that  X^  =»  X.  tt  is  called  the 
stationary  distribution  of  the  Markov  chain  (X^,  n > 0}.  Let  f be  a 
real  valued  function  on  E and  set 

(1.1)  r = E[f(X)l  = TTf  = S tt.  f(i)  . 

i£E 

We  shall  be  interested  in  finding  r for  the  given  function  f.  To  do 
so  we  could  solve  the  system  of  stationary  equations,  ir  = ttP,  and  then 
find  r by  applying  equation  (1.1).  However,  if  the  state  space  is  very 
large  it  may  be  quite  difficult  to  solve  these  equations  numerically.  In 
this  case  it  becomes  necessary  to  estimate  r via  simulation.  It  is  the 
efficient  estimation  of  such  quantities  that  is  our  concern.  The  techniques 
developed  here  can  also  be  extended  to  continuous  time  Markov  chains  and 
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semi-Markov  processes  by  using  the  techniques  of  Hordijk^  Iglehart  and 

Schassberger  ( I976) . The  reader  is  also  referred  to  equations  (5.I9)  to 

(3.28)  of  Heidelberger  ( I977)  for  further  details  concerning  this  extension. 

As  in  the  previous  paper  we  seek  to  find  functions  f^  : E so 

that  r = TTf  = r for  each  v = 0.  1 k.  By  defining 

V V 9 7 9'^ 


(1.2) 


/V  1 ** 

X (N)  = 3:  f (X  ) , 

v'  ■'  N+1  " v'  n'^  > 

n=0 


it  is  known  that  x^(N)  -♦  r^  = r almost  surely  (a.s.)  as  N 00  for  each 
V.  Let  g be  constants  such  that 

k 

(1.3)  y.  p(v)  = 1 . 

v=o 


If  Xp(N)  is  defined  by 


(1.1^) 


MN)  = Z p(v)  X (N)  , 
^ v=0 


AOr"!:' 

N ’ 

pn'; 

"A  ■ • - 

J.;s  1 i-.'. 

1 

t . 

BY 

:)i 

_ u.,L 

then  *p(N)  -» r a.s.  as  N ->».  We  now  pick  g = g*,  where  g*  minimizes 

the  asymptotic  variance  of  Xp(N).  We  previously  studied  the  variance 

reductions  obtained  when  the  functions  f were  chosen  to  be 

V 


(1.5) 


f = P f 


V = 0,  ...,  k . 


Here  alternate  methods  for  generating  multiple  estimates  for  r = -rf 
are  considered  in  the  special  case  when  the  Markov  chain  being  simulated 
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has  a finite  state  space.  Once  the  multiple  estimates  have  been  formed^ 
variance  reductions  can  be  obtained  in  exactly  the  same  manner  as  before. 
The  functions  f^,  v = 0^  ...^  k are  now  found  by  partially  solving  an 
appropriate  system  of  linear  equations  with  some  matrix  iterative  pro- 
cedure^ such  as  Gauss-Selde 1^  and  then  estimating  the  difference  between 
the  true  and  partial  solutions  via  simulation.  The  method  therefore 
combines  Che  techniques  of  numerical  analysis  and  simulation.  Each 
different  iterative  procedure  gives  rise  to  different  functions  f^^  and 
in  some  cases  to  different  underlying  stochastic  processes  to  be  simulated. 

These  methods  are  quite  similar  to  what  are  commonly  called  Monte 
Carlo  techniques  for  solving  systems  of  linear  equations.  Monte  Carlo 
solutions  for  these  problems  were  first  suggested  by  von  Neumann  and  Ulam 
in  the  l940's^  however  the  first  published  paper  on  this  subject  did  not 
appear  until  I95O  (see  Forsythe  and  Leibler  (I950)).  There  is  a vast 
amount  of  literature  on  Monte  Carlo  methods  and  the  reader  is  referred 
to  the  books  by  Hammers  ley  and  Handscomb  (196^*),  Shreider  ( I966)  or  the 
survey  article  by  Halton  ( I97O)  for  a complete  bibliography. 

The  motivation  and  source  of  problems  in  the  classical  Monte  Carlo 
literature  are  generally  quite  different  from  those  of  stochastic  process 
simulations.  The  systems  of  linear  equations  in  the  Monte  Carlo  literature 
typically  arise  as  finite  difference  approximations  to  the  solution  of 
multidimensional  partial  differential  equations  (this  is  also  the  motiva- 
tion behind  much  of  the  work  on  matrix  iterative  procedures).  As  a 
result  many  of  the  matrices  involved  have  special  properties^  such  as 
being  positive  definite  and  symmetric.  This  type  of  structure  will 
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generally  be  absent  in  the  systems  of  equations  which  arise  in  queueing 
theory  or  other  areas  of  applied  probability.  In  addition  for  most  Monte 
Carlo  solutions  of  linear  equations^  the  underlying  stochastic  process 
to  be  simulated  arises  in  a rather  arbitrary  fashion.  Thus  simulation 
often  seems  to  be  an  unnatural  solution  techniqje  for  these  problems.  On 
the  other  hand^  the  equations  appearing  in  applied  probability  have  an 
obvious  probabilistic  interpretation  so  that  if  the  standard  numerical 
methods  for  solving  equations  are  difficult  to  apply^  simulation  then 
becomes  a very  natural  solution  technique.  It  is  emphasized  that  Simula- 


2.  Iterative  Methods 


In  this  section  a class  of  variance  reduction  techniques  for 

simulating  finite  state  space  Markov  chains  are  described.  Pick  some 

state^  called  the  return  state,  in  E,  say  0.  Set  = 0 and  for 

m > 1 let  T be  the  mth  time  the  Markov  chain  fX  , n > 0}  visits 
— m n'  — 

the  return  state  0.  Call  the  time  between  T . and  T -1  the  mth 

ID- 1 m 

cycle  and  let  t = T - T , be  the  length  of  the  mth  cycle.  Because 
the  Markov  chain  is  assumed  to  be  irreducible  and  positive  recurrent 
there  will  be,  with  probability  one,  an  infinite  number  of  visits  to  0 
and  the  expected  time  between  consecutive  visits  is  finite.  For  any 
random  variable  Y let  E^[Y]  denote  the  expectation  of  Y given  that 
Xq  = i.  Define 


(2.1) 


Ti-l 

y(i)  = E [ Z f(X^)] 
1 n=0  -I 


(2.2) 


Let  y and  t denote  column  vectors  with  ith  entries  y( i)  and  t(i) 
respectively.  It  is  then  known  (see  Crane  and  Iglehart  (1975))  that 


(2.5) 


r = TTf  = y(o)/ 1(0)  . 


Algebraic  expressions  for  y and  t have  been  given  in  Hordijk,  Iglehart, 
and  Schassberger  ( I976) . These  expressions  will  form  the  basis  of  the 
variance  reduction  techniques. 


For  any  square  matrix  A let  p(A)  denote  the  spectral  radius  of 
A;  i.e.^  p(A)  is  the  modulus  of  the  largest  eigenvalue  of  A.  By 
Theorem  5.7  of  Varga  ( I962)  the  matrix  I-A  (where  I denotes  the 
identity  matrix)  is  nonsingular  if  and  only  if  p(A)  < 1.  If  p(A)  < 1 
then 


(I-A)"^  = Z a“  , 
n=0 

and  the  infinite  series  on  the  right  hand  side  of  the  equality  converges 
(elementwise).  Now  introduce  the  taboo  probabilities 

for  j = 0 

for  j / 0 

and  let  be  the  matrix  with  entries  o**ij’  ^ nonnegative^  irreducible 

and  Z Pj 1 = 1 for  all  i so  that  by  Lemma  2.5  of  Varga  ( I962) 

jeE 

p(P)  = 1.  Since  0 < q?  < P with  qP^q  < Pj^q  for  some  i (because  P 
is  irreducible)  Lemma  2.5  of  Varga  ( I962)  implies  that  p(qP)  < 1. 

Therefore  ( I-^P)  is  nonsingular  and 

(I-oP)'^  = Z qp"  • 

n=0 
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We  are  now  ready  to  give  expressions  for  y and  t; 


00 

(2.4)  y = E f • 

n=0 

Furthermore  y satisfies  the  set  of  linear  equations 

(2.5)  y = f + 0^  • 

If  f = e,  a vector  of  ones^  we  obtain  expressions  for  t from  (2.4) 
and  (2.5). 

Equations  of  the  form  (2.5)  have  a very  special  structure  that 

lend  themselves  to  at  least  two  different  methods  of  solution.  The 

first  is  matrix  iterative  procedures.  A comprehensives  study  of  these 

methods  is  given  in  Varga  ( I962) . The  second  approach  is  Monte  Carlo 

methods  or  simulation.  In  fact  it  was  for  equations  of  exactly  this 

form  that  von  Neumann  and  Ulam  first  suggested  using  simulation. 

The  approach  taken  here  is  a middle  ground  between  these  two  methods. 

Suppose  we  have  done  k iterations  of  some  matrix  iterative  procedure 

in  an  attempt  to  solve  (2.5).  Let  y be  our  approximation  to  y after 

k k k 

the  kth  iteration  and  let  e = y-y  . e is  then  the  error  in  the  partial 

Ic  * 

solution  y . A function  g^^  can  be  defined  so  that 

Tf-l 

(2.6)  cNl)  .E.[  ^I^g^(X„)]  . 

i 


f 


1 

1 

i 


k ’ 

We  can  then  obtain  an  estimate  of  the  error  e (0)  by  setting  = 0^ 

simulating  a nuirfjer  of  independent  cycles,  and  summing  the  function  gj^  j 

^k  k 

over  each  cycle.  An  unbiased  estimate,  e (O),  for  e (0)  can  be  obtained 

and  by  setting  y (O)  = y (0)  + e (O)  we  then  have  an  unbiased  estimate 

k k 

for  y(0)  (since  y(0)  = y (0)  + e (0)).  As  more  iterations  are  performed; 

i.e.,  as  k increases,  we  expect  y to  approach  y and  g^^  to  approach 

^k 

0.  Therefore  the  variance  of  the  estimate  for  y,  y (0),  is  also  expected 
to  approach  0.  This  will  indeed  be  the  case  for  any  convergent  iterative 
method,  however  the  matter  is  complicated  by  the  fact  that  we  are  really 
interested  in  estimates  of  r = y(0)/t(0),  not  just  estimates  of  y(0). 

This  issue  will  be  addressed  later. 

We  can  improve  this  procedure  (at  the  cost  of  additional  computer 
storage)  by  saving  the  functions  g^,  ...,  gj^,  obtaining  k+1  estimates 
for  r and  then  taking  the  minimum  variance  linear  combination  as  before. 

More  specifically  let  g^,  ...,  g^^  each  satisfy  equation  (2.6).  Define 
tjv)  by 

T -1 

(2.7)  = y''(0)  + ^ > 

"=Vl 


then  Eq[Y^(v)]  = y(0).  Since  during  any  cycle  there  is  only  one  index 

n for  which  X = 0 (that  index  is  T , for  the  mth  cycle)  we  note 
n m- 1 

that 


I 


where  f Is  defined  by 
V 


(2.8) 


= 


g^(o)  + y (0) 


for  i = 0 


for  i.  ^ 0 


From  here  we  proceed  exactly  as  before.  Set  Z ( v)  = Y ( v)  - rx  . then 

m ' m'  ^ m' 

W''>’  ’ 


(2.9)  ■’ij  - - 0<‘,  Ji'' 


and  let  E,  be  the  matrix  with  entries  a...  Because  E is  finite 
k ij 

< 00  for  all  i and  j.  Ej^  is  a symmetric  positive  semidefinite 
matrix^  which  we  will  assume  is  positive  definite. 

Suppose  now  that  = 0 and  we  simulate  the  process  for  M 
(independent)  cycles.  Let 


(2.10)  ^ (M)  = Z Y (V)/  Z T , 


V = 0,  . . . , k 


Then  r (M)  ->  r a.s.  as  M -» <».  Let  B be  constants  summing  to  one 
V ^ 

and  define 


(2.11) 


rg(M)  = Z P(V)  r (M) 
^ v=0 


1 


Then  ^^(M)  a.s.  as  M » and  we  can  form  confidence  intervals 

for  r based  on  the  central  Ix^ It  theorem 


(2.12) 


where 


VH  (r  (M)  - r) 


as  M 


-gz^g'  . e(j)  . 


g is  now  chosen  to  minimize  before 


(2.15) 


(2.1U) 


-1/  „-l 


P*  = e E,  /e  E, 


k ~ 


o^O*)  = 1/e  E"^  e‘ 


where  e is  a vector  of  ones.  Let  Rj^  = ^ measure  of  the 

amount  of  variance  reduction.  This  technique  can  also  be  applied  to  the 

/S 

point  estimate  ^^(N)  defined  in  (1.5). 

We  now  derive  expressions  for  the  functions  (and  also  for 

f^  by  equation  (2.8))  for  a number  of  matrix  iterative  methods.  We 
have  studied  the  Jacobi^  Gauss-Seidel  (G-S)  and  successive  overrelaxation 
(SOR)  methods  in  addition  to  their  block  analogs.  These  methods  seem  to 
be  the  most  popular  of  the  Iterative  procedures  when  the  problem  lacks 
special  properties^  such  as  the  transition  matrix  being  symmetric  or 
positive  definite. 


Ti 


fi 

i: 

i! 


i- 

i' 

[ 
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Jacobi's  Method 

We  week  the  solution^  y,  to  y = f + Because  p(o^)  ^ ^ 

0 , 

Jacobi's  method  is  known  to  converge  (to  y)  for  any  initial  y (see 
Theorem  3.3  of  Varga  ( I962) ) . Jacobi's  iterative  procedure  is  defined 
by 

(2.15)  y*"  = f + k>l 

with  y^  given.  This  may  be  written  componentwise  as 
y\i)  = f(i)  + Z oPii  • 

j=o 

Starting  from  equation  (2.15)  we  find 

(I-oP)y‘^  = f + - 0^^ 

y*^  = (i-oP)'^f  + 

= y + (i-qP)’^  [oP(y*^'^  - y*^)]  , 

the  last  equality  being  true  by  equation  (2.4).  For  k > 1 define 
gk  by 

(2.16)  Sk  = ^ 

then 
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1 


y(i)  = y (i)  + E 


.[I 


which  is  the  desired  result.  If  y ( i)  = 0 for  all  i £ then  the 
formulas  simplify  to 


k V „n, 

y = T.  0^  ^ > 


k > 1 


®k  0 


= f 


k > 0 


Gauss-Seidel 

The  Gauss-Seidel  iterative  procedure  can  be  considered  as  an 

acceleration  to  Jacobi's  method.  In  Jacobi’s  method  only  the  values  of 
k-1  k 

y are  needed  to  generate  y ^ whereas  in  G-S  the  most  recently 
k k 

available  values  y (j)  for  j < i are  used  to  find  y ( i) , By  Theorem  5.5 
of  Varga  ( I962)  it  is  known  that  G-S  converges  at  a faster  asymptotic 
rate  than  Jacobi's  method.  The  basic  G-S  iteration  is  defined  by 


i- 1 E 

(2.17)  y*^(i)  = f(i)  + z qP..  y^j)  + Z qP..  y'^’^(j)  , k > l 

j=0  j=i 


where  y is  given  and  sums  over  empty  sets  are  considered  to  be  0.  From 
equation  {2.I7) 


y‘^(i)  = f(i)  + Z qP..  y'"(j)  + Z j)  ■ • 

j=0  j=i  ^ 


i 


Define  gj^  by 

(2.18)  g (i)  = - z - y^j))  , i e E . 

j=i 

Then  In  matrix  notation 

k k 

y = f + 0^^  - 8k 

(I.oP)y^  = f - 8k 

y*^  = y - (I-qP)'^  gj^  . 

Therefore 

Ti-1 

y(i)  = y^i)  E.[  gjX^)j  . 

As  this  is  the  form  needed  to  apply  the  variance  reduction  technique^  the 
appropriate  function  gj^  for  G-S  is  given  by  equation  (2.18). 

Block  and  Point  Iterative  Methods 

We  now  turn  to  the  study  of  block  and  point  iterative  methods. 

These  methods  have  the  interesting  property  that  the  underlying  Markov 

chain  to  be  simulated  is  generally  not  the  same  as  the  original  Markov 

chain  {X  . n > 0}.  The  new  Markov  chains  arise  as  a special  case  of  the 
' ^ n’  — 

method  to  be  presented  in  a subsequent  paper.  Suppose  we  are  again  trying 
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to  solve  y = f + ^Py.  We  now  partition  the  state  space  into  disjoint 
subsets,  or  blocks,  B^,  i = 0,  . . . , i.e.,  E = U B.  and  0 B^  = 0, 

the  empty  set,  for  i ^ j.  For  convenience  we  shall  assume  that  ®q  = fO], 
although  this  may  be  relaxed  using  the  techniques  of  the  forthcoming  paper. 
We  now  partition  f and  y into  ( f^,  ...,  f^^  ) ' and 

I ® 

^^0^  ^1^  ^ where  f^  and  y^  consist  respectively  of  elements 


f(j)  and  y(j) 

for  j € B^. 

Similarly 

0^ 

✓ 0^00 

P • « • 

0 01 

O^ON 

0^  " 

( ■” 

0^1  ••• 

p 

0 IN 

\ , 

ONgO 

p • • • 

ONgl 

o\i 

where  Q^ij  “ ^ ^ ^i-’  ^ ^ system  of  equations  then 

becomes  ( for  the  ith  block) 

N, 


(2.19) 


B 

y . = f . + Z „P. . y . 

" " jio  0 j 


Letting  denote  the  identity  matrix  with  the  number  of  rows  and 

columns  equal  to  the  number  of  elements  in  the  ith  block,  equation  (2.I9) 
can  be  simplified; 
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(2.20)  . (Ijj  - fj  * (Ij,  - „P.j)-l  „Pjj  ,j 


Define  h.  and  ^R..  by. 
i 0 ij 


(2.21) 

and 


h,  = (I..  - ^P,.)"^ 
i ' 11  0 ti^  i 


for  1 = j 


(2.22) 


o“lJ 


<'ll  - O^J 


Equation  (2.20)  then  becomes 


(2.23) 


^ - "i  » o^ij  yj  . 


or  if  qR  is  the  matrix  with  elements  o^ij  been  partitioned 


as 


15 


t 

and  if  h = (h^,  ) then  (2.25)  may  be  rewritten  as 


(2.24) 


y = h + ^Ry  . 


If  we  now  perform  Jacobi  (G-S)  Iterations  on  equation  (2.24)  we  obtain 

the  block  Jacobi  (G-S)  iterative  procedure. 

We  now  turn  to  the  probabilistic  interpretation  of  the  matrix  ^R. 

As  was  done  with  partition  P into  the  blocks  i = 0,  ..., 

and  let  P..  = (p,  . : kCB  £ € B).  Define 
ij  ^‘^k£  j 


(2.25) 


R.  . 

ij 


ij 


i = 0 


i 0,  i = J 


<'ii  - "ij-  ‘ 0-  ‘ ^ i 


is  a 
then 


- P^^  will  be  nonsingular  because  P is  irreducible)  and  let 
(R^ . 0 < i^  j < Ng).  It  is  then  easy  to  show  that  the  matrix  R 

transition  matrix;  i.e.,  if  R has  elements  r^ ^ for  i,  j ^ E, 


r . . > 0 and  ^ r . 
- j£E 


ij 

1 for  all  i G E.  The  matrix  of  taboo 


! 

! 

I 
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probabilities  obtained  by  setting  the  0th  column  of  R equal  to  0 and 
leaving  R otherwise  unchanged  is  then  equal  to  ^R.  Define  the  stopping 
time  S by 

inf{n  > 0 : 

One  can  then  show  that 


if  Xq  ^ 0 

if  Xq  € and  k 0 . 


The  numbers  d( t)  may  be  obtained  from  equation  (2.21)  when  f = e. 
Define 


and 


6» 

m 


T'-l 

m 

z 


n=T 


m- 1 


d(C„) 


f 


y;(v) 


T'-l 

y''(o)  Z 

n=T'  , 
m- 1 


«v«=n> 


Setting  have  = y(0)  ■ rt(0)  = 0. 

Let 


M 

Z 

in=l 


v;(v)/ 


M 

z 

m=l 


6'  . 
m 


We  can  now  proceed  as  before  (inserting  primes  into  the  formulas  (2.9) 
and  (2.11)  - (2.14)  whenever  necessary).  The  important  thing  to  notice 
is  that  we  are  now  simulating  the  Markov  chain  n > 0)  with  its 

transition  matrix  R rather  than  the  Markov  chain  {X  , n > 0}  with 
its  transition  matrix  P. 


Block  Jacobi 

Let  h and  ^R  be  defined  in  (2.21)  and  (2.22).  With  y^ 
given^  the  block  Jacobi  iterations  are 

y = h + ^Ry  , k > 1 . 
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The  function  which  is  derived  exactly  as  in  Jacobi's  method^  is  given 

by 

(2.28)  = - oR(y‘''^  - y'')  . 

Block  Gauss-Seidel 

Let  y^  be  given  and  let  h and  be  defined  in  (2.21)  and 

(2.22).  The  basic  block  G-S  iteration  is 

. i-1 

y^i)  = h(i)  + E y7j)  + E y "^j)  , 
j=0  ^ j=i  ^ ■' 

and  the  function  g^^  is  defined  by 

(2.29)  S^ii)  = - • 

If  the  blocks  are  chosen  to  be  singletonsj  i.e.^  if  = (i)^  then 
the  methods  are  called  point  Jacobi  and  point  Gauss-Seidel.  The  convergence 
of  these  block  iterative  methods  can  be  shown  by  using  Theorem  3.13  of 
Varga  ( I962) . Again  the  important  thing  to  emphasize  about  the  block 
methods  is  that  an  entirely  different  Markov  chain  must  be  simulated  to 
apply  the  technique. 

Successive  Overrelaxation 

Successive  overrelaxation  may  be  thought  of  as  an  acceleration 
to  Gauss-Seidel.  Let  cu,  the  relaxation  factor,  be  given.  In  order  for 
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SOR  to  converge  it  is  necessary  that  0 < o)  < 2,  (see  Theorem  3-5  of 

Varga  (1962))^  although  this  is  not  a sufficient  condition  (co  = 1 

corresponds  to  G-S) . However^  the  method  does  converge  for  0 < co  < 

where  1 < < 2.  Actually  for  our  purposes  it  is  not  really  crucial 

that  CO  be  restricted  to  this  interval  since  for  any  value  of  co  and 

any  finite  number  of  iterations  the  procedure  defines  valid  functions 

for  the  simulation.  Much  work  has  been  done  on  trying  to  find  the 

optimum  relaxation  factor^  cOj^;  i.e.,  trying  to  find  that  value  of  co 

which  maximizes  the  asymptotic  rate  of  convergence  of  the  iterative 

procedure.  In  our  case,  finding  co^  is  not  so  critical  since  we  are 

interested  in  minimizing  a variance  and  not  in  maximizing  the  rate  of 

convergence  of  y to  y,  two  entirely  different  matters.  What  would  be 

more  appropriate  here  is,  for  a fixed  number  of  iterations  k,  to  find  the 

2 

value  of  CO,  say  cc^,  that  minimizes  the  optimal  variance,  cTj^O*)  . To  do 
so  theoretically  would  be  very  difficult  indeed  so  it  is  recommended  that 
be  estimated  from  relatively  short  preliminary  simulation  runs. 


The  SOR  iterative  method  requires  the  diagonal  elements  of 
to  be  0.  If  p^j,  > 0 for  some  i,  partition  E into  blocks  = (i) 

as  in  the  previous  section.  As  a special  case  of  equation  (2.25),  we 
obtain  a transition  matrix  R with  entries 


I 

i 


and  let  be  defined  Co  be  the  matrix  with  entries 


j = 0 


j 0 . 


R has  all  diagonal  elements  equal  to  0.  Let  h be  defined  as 


(2.30) 


h(i)  = 


_iai 


i = 0 


i 0 , 


which  is  a special  case  of  equation  (2.21).  Then  y = h + 

are  able  Co  perform  SOR  iterations  on  this  system  of  equations.  With 

y^  given^  the  basic  SOR  iteration  is 

y\i)  = (1-0))  y’^'^i)  + Jh(i)  ^ r y^j)  + Z gr  y^'^j)]. 

L j=0  ■'  j = i+l  J J 


To  obtain  g ( i)  write 


E ”e 

y‘^(i)  = (1-0))  y*^"^(i)  +ofh(i)  + Z qC  y^(j)  + Z Qr  (y*^"^(j)-y*^(j))l 

L j=0  ^ j=i+l  J 

= (1-0))  y*^"^  + o)[h  + R y*^  + b ] 


where 
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(2.31) 


Mi)  = ^ 

^ j=Ul 


Further  simplification  yields 


y^.  (1  . Jl)-\  - (I  - 


- y - (I  - o«)'  H 


with  gj^  defined  by 


(2.32) 


r,  ( 1-cu)  , k-1  T 


We  then  have 


T|-l 

(2.35)  y(i)  = y^i)  + e[  = i] 


where  {C  , n > 0}  is  a Markov  chain  with  transition  matrix  R.  Multiple 

estimates  for  r are  then  produced  in  the  same  manner  as  for  the  block 

iterative  methods  (d(i)  is  obtained  from  (2.30)  with  f(  i)  = 1).  For 

block  SOR  formulas  (2.32)  and  (2.53)  remain  valid  provided  h and  ^R 

are  defined  as  in  (2.21)  and  (2.22).  Theorem  5.15  of  Varga  ( I962)  implies 

that  block  SOR  converges  for  0 < co  < 1,  and  therefore  by  the  continuity 

of  eigenvalues  for  0 < cu  < co  where  1 < cu  <2. 

° max  max  — 


I 


3.  Numerical  Considerations 

In  this  section  we  discuss  some  of  the  problems  that  might  be 
encountered  in  the  implementation  of  these  methods  and  present  numerical 
results  for  a particularly  simple  Markov  chain^  the  queue  length  process 
in  a finite  capacity  m/m/1  queue. 

The  first  problem  that  one  faces  with  these  methods  is  choosing  a 
return  state.  As  Table  1 indicates,  the  variance  reductions  can  vary 
dramatically  as  the  return  state  changes.  This  is  in  contrast  to  the 
previous  method  of  picking  f^  = p'^f  to  generate  the  multiple  estimates 
for  r.  For  that  particular  choice  of  functions,  the  variance  reductions 
can  be  shown  to  be  independent  of  the  return  state.  Preliminary  simulation 
runs  could  be  used  to  determine  a good  return  state.  It  seems  likely  that 
frequently  occuring  states  (or  equivalently  states  with  relatively  short 
expected  cycle  lengths)  will  be  the  best  candidates  for  return  state. 

The  iterative  methods  also  assume  that  the  states  are  ordered  in 
some  manner.  Very  frequently  in  simulations  each  state  represents  a 
vector,  e.g.,  a state  may  represent  the  queue  lengths  at  various  service 
centers  in  a network  of  queues.  Since  there  are  many  ways  to  order  the 
states  and  each  different  ordering  gives  rise  to  different  functions 
f^,  ...,  fj^  (and  therefore  different  variance  reductions),  the  simulator 
must  take  care  to  use  an  ordering  that  will  give  good  variance  reductions. 
Again  this  problem  is  not  encountered  when  the  simulator  chooses  the 
functions  f^  = P f;  in  that  case  the  order  of  the  states  is  irrelevant. 

Tables  2 through  6 give  calculated  variance  reductions  for  estimating 
the  expected  stationary  queue  length  in  the  finite  capacity  M/M/1  queue 
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MCiS”; 


A 


for  a variety  of  the  methods  discussed  (this  continuous  time  problem  has 


been  transformed  into  discrete  time  using  the  techniques  of  Hordijk^ 

Iglehart  and  Schassberger  ( I976) ) . The  block  Gauss-Seidel  method  (Table  6) 

gives  the  best  variance  reductions  for  all  values  of  the  traffic 

intensity.  The  figures  for  this  method  include  the  variance  reduction 

that  is  obtained  by  simulating  the  Markov  chain  {C  n > 0}  rather  than 

n ~ 

2 

fX  , n > 0}.  That  variance  reduction  is  reflected  in  the  R values 
n’  — ' 0 

and  will  be  discussed  at  greater  length  elsewhere.  Due  to  the  work  involved 
in  forming  the  transition  matrix  for  the  chain  ^ OJ,  block  G-S 

requires  more  computation  to  be  done  before  the  simulation  than  the  other 
methods.  These  extraordinarily  good  results  for  block  G-S  may  not  be 
typical  because  of  the  very  special  structure  of  the  m/m/I  queue.  Observe 
that  each  of  the  other  methods  is  capable  of  producing  substantial  ] 

variance  reductions,  although  none  dominates  the  others  for  all  values 
of  p.  It  is  perhaps  somewhat  surprising  that  for  SOR  (see  Table  5),  the 
value  of  o)  does  not  have  that  great  an  impact  on  the  variance  reductions. 

This  is  probably  because  the  variance  reduction  results  primarily  from 

/N  /N 

the  high  correlation  between  the  estimates  rQ(M),  ...,  tj^(M)  and  not 

from  how  well  the  iterative  procedure  performs;  i.e.,  not  from  how  close 

k 0 

y is  to  y.  In  fact  for  p = ,9  and  y =0  G-.  requires  over  200 

iterations  to  make  |!y*^-y  11/ lly  11  < 0.01  where  H-H  denotes  the  Euclidean 

norm.  For  small  values  of  k,  y and  y tend  to  differ  substantially. 

The  major  difficulty  with  using  these  methods  will  be  computing 

and  storing  the  functions  f^.  Unless  the  transition  matrix  is  quite 

sparse  the  amount  of  work  involved  in  computing  f^,  •••>  even 

2U 


1 


small  values  of  k will  probably  be  too  great  to  justify  the  use  of  the 

V 

methods.  The  work  involved  in  generating  f^  = P f is  slightly  less 
than  that  required  to  form  f^  for  G-S.  To  perform  k G-S  iterations 
requires  almost  exactly  the  same  amount  of  work  as  forming  f^  Pf^  •••}  ^ 
However  at  the  end  of  the  kth  G-S  iteration  one  must  also  calculate 

Z qPh  y (j)  i*'  order  to  compute  f ( i)  . For  the  block  methods  additional 

j = l ^ ^ 

computation  must  be  done  to  form  the  necessary  transition  matrix  R and 
functions  h and  d.  Once  these  have  been  formed  the  work  needed  for 
block  Jacobi  (G-S)  is  about  the  same  as  for  Jacobi  (G-S), 

Since  all  of  the  iterative  methods  we  have  studied  are  convergent 
(for  at  least  some  values  of  oj  in  the  case  of  SOR)^  y y as  k 

By  examining  the  functions  gj^  for  each  method  it  is  seen  that  g^^  ->  0 
as  k ^ oo.  Recalling  that 


o = var(Z  (k))  = var(Y  (k)  - rx  ) 


= var(y  (O)  + Z 8^^^)  " f 

2 2 

it  can  be  shown  that  ->  r var(r^)^  which  will  in  general  be  positive. 

2 2 

In  fact  for  many  Markov  chains  r var( t^)  > due  to  the  high  positive 

/s 

correlation  between  Y (0)  and  x . Thus  for  large  values  of  k.  r,  (M) 

m m ° ' k'  •' 

tends  to  be  a more  variable  point  estimate  for  r than  rQ(M).  In  fact 

2 

for  the  iterative  methods  we  have  not  been  able  to  show  that  a,^0^^) 

converges  to  0 as  k This  again  contrasts  to  the  case  when 

k 2 2 

f = P f where j for  a finite  state  space^  a,  and  a (p*)  both  converge 
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to  0.  In  applications^  however^  k will  usually  be  small  so  that  for 
the  iterative  methods  this  property  should  not  prove  troublesome. 


We  have  so  far  considered  performing  iterations  on  the  set  of 
equations  y = f + qPY,  which  enables  the  formation  of  multiple  estimates 
for  the  numerator  of  r = y(0)/t(0).  It  is  also  possible  to  apply  the 
method  to  the  equations  t = e + thereby  allowing  the  formation  of 

y 

multiple  estimates  for  the  denominator  of  r.  Letting  t denote  our 
approximation  to  t after  v iterations  we  could  find  functions  e 


such  that 


Tf-l 

t(i)  = t'’(i)  + E [ Z %(X)1 
n=0  “ J 


\(^)  = t (0)  + z , 


then  done  and  kg  iterations  on  y and  t 

/N 

respectively  we  could  obtain  (k^^+l)  (kg+l)  point  estimates^ 


r where 


nfcl  TIl=:l 


ki  kg 

E Z 3(iJ)  = 1 

i=0  j=0 


and  define 
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.s,  12 

r (M)  = E E P(i,j)  r (M) 
“ i=0  j=0 


We  could  then  pick  the  constants  p(i^j)  to  minimize  the  asymptotic 

variance  of  . To  carry  this  out  in  practice  we  would  need  to 

estimate  a (k^+l)(k2+l)  by  ( kj^+1) ( k^+l)  dimensional  covariance  matrix. 

Because  the  estimates  r^j(M)  will  usually  be  quite  highly  correlated^ 

this  covariance  matrix  is  likely  to  be  very  ill  conditioned^  even  for 

2 

moderate  values  of  k^  and  kg.  Thus  and  a (P*)  are  likely 

to  be  difficult  quantities  to  estimate.  Table  7 gives  variance  reductions 
for  the  finite  capacity  M/m/ 1 queue  when  k^^  = 2 and  kg  = 0^  ^ 

the  Gauss-Seidel  iterative  procedure.  For  large  values  of  the  addi- 
tional variance  reductions  obtained  by  performing  iterations  on  t are 
hardly  enough  to  justify  the  use  of  multiple  estimates  for  the  denominator^ 
particularly  considering  the  probable  difficulty  in  estimating  the 
covariance  matrix.  Of  course  for  different  functions  f it  may  be  more 
appropriate  to  use  multiple  estimates  for  the  denominator  rather  than 
the  numerator  of  r (for  example  in  estimating  the  stationary  probability 
of  a particular  state). 
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TABLE  1 


Effect  of  Return  State  on  Variance  Reductions  for 
Finite  Capacity  M/M/1  Queue  Using  Gauss-Seidel: 

r = E(X),  y°  = 0 


p 

Return 

State 

"l 

4 

"2 

.5 

0 

.0720 

.0408 

.0207 

.2682 

.2021 

.1440 

.5 

2 

.2222 

.1376 

.0557 

.4715 

.3709 

.2361 

.5 

4 

.5756 

.5680 

.5674 

.7587 

.7336 

.7532 

.5 

6 

.5J?25 

.5034 

.5002 

.7229 

.7095 

.7073 
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TABLE  2 


Variance  Reductions  for  Finite  Capacity  M/M/ 1 Queue  Using 

f = p''f  : r = E(X) 

V ' 


2 

2 

„2 

*^2 

p 

•^2 

*^3 

.25 

.1168 

.0292 

.0075 

.5417 

.1709 

,0854 

.50 

.25UI 

.1121 

.0524 

.4858 

.3347 

.2288 

.75 

.5328 

.1440 

.0663 

.5769 

.5794 

.2574 

.90 

.6050 

.2659 

.1148 

.7778 

.5156 

.3388 

.95 

.6880 

.5425 

.1607 

.8294 

.5852 

.4009 

.99 

.7404 

.4056 

.2047 

.8605 

.6569 

.4525 
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TABLE  3 


Variance  Reductions  For  Finite  Capacity  m/m/1 
Queue  Using  Jacobi's  Method: 

r = E(X)^  y =0^  Return  State  = 0 


p 

R2 

^3 

.25 

.1168 

.0291 

.0075 

.5417  . 

. 1710 

.0855 

.50 

.'2541 

.1289 

,0645 

.4858 

.5590 

.25110 

.75 

.5552 

.2029 

.1574 

.5772 

.4504 

.3967 

.90 

.6043 

.^72 

.1468 

.7774 

.5266 

.5831 

.95 

.6878 

.5465 

.1732 

.8291+ 

.5886 

.4161 

• 99 

.7405 

.4048 

.2087 

.8605 

.6562 

.4569 

50 


TABLE  4 


Variance  Reductions  for  Finite  Capacity  M/M/l  Queue 
Using  Gauss-SeideL: 

0 

r = E(X)  y =0  Return  State  = 0 


p 

*^2 

^3 

.25 

.0099 

.0015 

.0002 

.0997 

.0390 

.0121 

.50 

.0720 

.0408 

.0207 

.2682 

.2021 

.1440 

.75 

.2720 

.1497 

.0813 

.5215 

.3869 

.2351 

.90 

.5959 

.3881 

.1834 

.7706 

.6230 

.4283 

.95 

.6789 

.4794 

.2414 

.8239 

.6924 

.4914 

.99 

.7321 

.^hk6 

.2963 

.8556 

•7380 

.5443 
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TABLE  5 


r 


Variance  Reductions  for  Finite  Capacity  M/M/1 
Queue  Using  SOR:  > 


r = E(X), 

•>* 

0 

II 

Return  State  = 

0 

n2 

1.2 

Rg 

p 

CO 

.5 

.75 

.1217 

.0341 

.0168 

.3489 

.1847 

.1296 

.5 

1.00 

.0720 

.0408 

.0207 

.2682 

.2021 

.1440 

.5 

1.25 

.0268 

.0246 

.0212 

.1638 

.1567 

.1455 

.5 

1.50 

.0151 

.0080 

.0056 

.1250 

.0893 

.0747 

.5 

1.75 

. 1096 

.0261 

.0091 

.3310 

.1616 

.0955 

.9 

.75 

.5951 

.3484 

.1643 

.7714 

.5903 

.4054 

.9 

1.00 

• 5939 

.3881 

.1834 

.ncrj 

.6229 

.4282 

.9 

1.25 

.5987 

.4379 

.2155 

•7738 

.6618 

.4642 

.9 

1.50 

.6174 

.5007 

.2686 

.7858 

.7076 

.5183 

.9 

1.75 

.6594 

.5786 

.3547 

.8120 

.7607 

.5955 

32 
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TABLE  6 


Variance  Reductions  for  Finite  Capacity 
Queue  Using  Block  G-S; 

r = E(X)^  = 0,  Return  State  = 0,  Block  Sizes  = ( 1,5, 5, 5, 3, 2) 


p 

^0 

^0 

‘‘l 

4 

^2 

^3 

.25 

.2942 

.0003 

1.22  X 10"^ 

4.17  X lO"^ 

.5^+24 

.0172 

.0035 

.0020 

.50 

.4441 

.0075 

.0026 

1.32  X 10"5 

.6664 

.0864 

.0512 

.0036 

.75 

.4057 

.0116 

.0112 

1.56  X 10"^ 

.6355 

.1080 

.1061 

.0117 

.90 

.5624 

.0397 

.0133 

.0015 

.6020 

.1992 

.1155 

.0560 

.95 

.3510 

.0543 

.0127 

.0052 

.5925 

.2331 

.1150 

.0567 

.99 

.5429 

.0662 

.0127 

5.25  X 10"^ 

.5856 

.2574 

.1126 

.0072 
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TABLE  7 

Variance  Reductions  For  Finite  Capacity  m/m/1  Queue 
Using  Multiple  Estimates  for  Numerator  and  Denominator  and  G-S: 
, . 0 0 

r = E(X)j  y = 0,  t =0^  Return  State  = 0 


p 

R(2,0)^ 

R(2,0) 

R(2,l)^ 

R(2, 1) 

R(2,2)^ 

R(2,2) 

.25 

.0015 

2.80  X lO'^ 

2.78  X lO"^ 

.0590 

.0055 

.0052 

•50 

.0401 

.0092 

.0033 

.2003 

.0961 

.(^79 

.75 

.1498 

.1238 

.1237 

.5870 

.5518 

.5518 

.99 

.5448 

.5366 

.5196 

.7581 

.7326 

.7208 

2 

R(kj^^k2)  = Variance  reduction  when  kj^(k2)  G-S 

iterations  are  performed  on  tne  numerator 
( denominator) 


3** 


r 

t 
i 
♦ 

i 
1 

4 . Reconmendations 

j We  have  presented  a class  of  variance  reduction  techniques  for 

, estimating  functionals  of  the  stationary  distribution  of  Markov  chains. 

The  methods  all  require  additional  computation  to  be  done  both  before  and 
during  the  simulation^  but  if  the  variance  reduction  obtained  is  large 
enough  an  overall  decrease  in  computation  can  be  achieved.  The  methods 
are  likely  to  be  computationally  efficient  when  the  transition  matrix  of 
the  Markov  chain  is  relatively  sparse.  It  is  impossible  to  say  a priori 
which  method  will  yield  the  best  variance  reduction  for  a given  Markov 
chain  that  is  to  be  simulated.  The  simulator  is  advised  to  experiment  with 
the  methods  on  short  preliminary  runs  to  get  estimates  for  the  variance 
reductions.  Based  on  these  estimates  a particular  method  (or  methods^ 
one  need  not  be  limited  to  only  one  method)  could  be  chosen.  If  pre- 
liminary experimentation  is  impossible  it  is  then  suggested  that  the 

y 

functions  f^  = P f be  used  to  generate  the  multiple  estimates.  While 
this  may  not  always  produce  the  best  variance  reductions  possible^  it  is 
likely  to  be  the  most  reliable  and  easily  implemented  method.  The 
reliability  of  this  method  stems  from  the  fact  that  the  variance  reductions 
are  independent  of  the  manner  in  which  the  states  are  ordered  and  which 
return  state  is  chosen  for  the  simulation. 
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